{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python 机器学习实战 ——代码样例\n",
    "\n",
    "# 第十二章 K 近邻算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## 欧氏距离计算\n",
    "\n",
    "首先，构造两个样本点："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "x=np.array([1,1])\n",
    "y=np.array([4,5])   \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "计算上述两个样本点之间的欧式距离："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.0\n"
     ]
    }
   ],
   "source": [
    "from math import *\n",
    "def e_distance(x,y):\n",
    "    return sqrt(sum(pow(a-b,2) for a, b in zip(x, y)))\n",
    "    \n",
    "print(e_distance(x, y))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 曼哈顿距离计算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "仍然使用上述的两个样本点，计算它们之间的曼哈顿距离："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7\n"
     ]
    }
   ],
   "source": [
    "from math import *\n",
    "def m_distance(x,y):\n",
    "    return sum(abs(x-y))\n",
    "    \n",
    "print(m_distance(x, y))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 切比雪夫距离计算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算两点之间的切比雪夫距离："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n"
     ]
    }
   ],
   "source": [
    "from math import *\n",
    "def q_distance(x,y):\n",
    "    return abs(x-y).max()\n",
    "    \n",
    "print(q_distance(x, y))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 夹角余弦距离计算"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算两点之间的夹角余弦距离："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.993883734674\n"
     ]
    }
   ],
   "source": [
    "from math import *\n",
    "def cos_distance(x,y):\n",
    "    return np.dot(x,y)/(np.linalg.norm(x)*np.linalg.norm(y))\n",
    "    \n",
    "print(cos_distance(x, y))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用K近邻算法对 Iris 数据集进行分类"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本节我们将通过一个例子讲解 K 近邻对 Iris 数据集进行分类。Iris 数据集是一个常用的分类用数据集，以鸢尾花的特征作为数据来源，数据集包含 150 个样本，分为 3 类花种，每类 50 个样本，每个样本包含 4 个独立属性 ( 萼片长度、萼片宽度、花瓣长度、花瓣宽度 )。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 首先，我们导入将用到的库。\n",
    "\n",
    "from sklearn import datasets\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn import cross_validation\n",
    "from sklearn.metrics import classification_report\n",
    "from sklearn.metrics import confusion_matrix\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "\n",
    "# 准备数据集，并分离训练集和验证集。\n",
    "\n",
    "iris = datasets.load_iris()  \n",
    "X = iris.data  \n",
    "Y = iris.target \n",
    "validation_size = 0.20\n",
    "seed = 1   \n",
    "X_train, X_validation, Y_train, Y_validation = cross_validation.train_test_split(X, Y, test_size=validation_size, random_state=seed) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy_score: 1.0\n",
      "混淆矩阵： \n",
      " [[11  0  0]\n",
      " [ 0 13  0]\n",
      " [ 0  0  6]]\n",
      "分类报告： \n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "          0       1.00      1.00      1.00        11\n",
      "          1       1.00      1.00      1.00        13\n",
      "          2       1.00      1.00      1.00         6\n",
      "\n",
      "avg / total       1.00      1.00      1.00        30\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 创建 KNN 分类器，并拟合数据集。\n",
    "\n",
    "knn = KNeighborsClassifier()\n",
    "knn.fit(X_train, Y_train)\n",
    "\n",
    "# 在验证集上进行预测，并输出 accuracy score，混淆矩阵和分类报告。\n",
    "\n",
    "predictions = knn.predict(X_validation)\n",
    "print('accuracy_score:',accuracy_score(Y_validation, predictions))\n",
    "print('混淆矩阵： \\n',confusion_matrix(Y_validation, predictions))\n",
    "print('分类报告： \\n',classification_report(Y_validation, predictions))\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
